N O N L I N
Nonlinear Regression Analysis Program
Version 1.0
Phillip H. Sherrod
A "shareware" program
Nonlin allows you to perform regression analyses to
estimate the values of parameters for linear,
multivariate, polynomial, and general nonlinear
functions. The regression analysis determines the
values of the parameters which cause the function to
best fit the observed data that you provide. Nonlin
allows you to specify the function whose parameters are
being estimated using ordinary algebraic notation. In
addition to determining the parameter estimates, Nonlin
can be directed to generate an output file with
predicted values and residuals. It can also plot the
data observations and the computed function.
NONLIN -- Nonlinear Regression Program Page 1
INTRODUCTION TO REGRESSION ANALYSIS
The goal of regression analysis is to determine the best
estimates of parameters for a function
depvar = f(p,indepvar)
where `depvar' is the dependent variable, `indepvar' is one or
more independent variables, and `p' is one or more parameters
whose values are to be estimated. In linear regression, the
function, f, is a linear (straight line) equation.
For example, if we assume the value of an automobile decreases by
a constant amount each year after its purchase, and for each mile
driven, the following linear function would predict its value
(the dependent variable) as a function of the two independent
variables which are age and miles:
value = price + depage*age + depmiles*miles
where `value', the dependent variable, is the value of the car,
`age' is the age of the car, and `miles' is the number of miles
that the car has been driven.
The regression analysis will determine the best values of the
three parameters, `price', the estimated value when age is 0,
`depage', the depreciation that takes place each year, and
`depmiles', the depreciation for each mile driven. The values of
`depage' and `depmiles' will be negative because the car loses
value as time and miles increase.
In a problem such as this car depreciation example, you must
provide a data file containing the values of the dependent and
independent variables for a set of observations. In this example
each observation record would contain three numbers: value, age,
and miles, collected from used car ads for the same model car.
As a rule, the more observations you provide, the more accurate
the estimates of the parameters will be.
Once the values of the parameters are estimated, you can use the
formula to predict the value of a car based on its age and miles
driven. If a perfect fit existed between the function and the
actual data, the actual value of each car in your data file would
exactly equal the predicted value. Typically, however, this is
not the case, and the difference between the actual value of the
dependent variable and its predicted value for a particular
observation is the error of the estimate which is known as the
"deviation" or "residual". The goal of regression analysis is to
determine the values of the parameters which minimize the sum of
the squared residual values for the set of observations. This is
known as a "least squares" regression fit.
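As a rough illustration of the quantity being minimized, the
following Python sketch computes the residuals and their sum of
squares for the car depreciation function. The observations are
the five sample data records from the complete command file
example in the DATA command description; the two sets of trial
parameter values are invented for illustration and are not
results produced by Nonlin.

```python
# Sketch only: evaluates the least squares criterion; it does not fit anything.
# Each observation is (age, miles, value), as in the sample data records.
observations = [
    (2, 10000, 13000),
    (4, 42000, 9000),
    (1, 7000, 17000),
    (6, 52000, 6000),
    (5, 48000, 8000),
]

def predicted_value(price, depage, depmiles, age, miles):
    """value = price + depage*age + depmiles*miles"""
    return price + depage * age + depmiles * miles

def sum_squared_residuals(price, depage, depmiles, rows):
    """Sum over all observations of (actual - predicted) squared."""
    total = 0.0
    for age, miles, value in rows:
        residual = value - predicted_value(price, depage, depmiles, age, miles)
        total += residual ** 2
    return total

# Two hypothetical trial parameter sets (assumed values, not Nonlin output).
flat_guess = sum_squared_residuals(10000.0, 0.0, 0.0, observations)
sloped_guess = sum_squared_residuals(18000.0, -1500.0, -0.05, observations)
print(flat_guess, sloped_guess)
```

The second trial gives a much smaller sum of squares than the
first, so by the least squares criterion it is the better of the
two candidate fits.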
INTRODUCTION TO NONLIN
Nonlin is a very powerful regression analysis program. Using it
you can perform multivariate, linear, polynomial, and general
nonlinear regression. What this means is that you specify the
form of the function to be fitted to the data, and the function
can include nonlinear terms such as variables raised to powers
and library functions such as log, exponential, sine, etc.
Nonlin uses a state-of-the-art regression algorithm that works as
well as, or better than, any you are likely to find in commercial
statistical packages.
As an example of nonlinear regression, consider another
depreciation problem. The value of a used airplane decreases for
each year of its age. Assuming the value of a plane falls by the
same amount each year, a linear function relating value to age
is:
Value = p0 + p1*Age
Where `p0' and `p1' are the parameters whose values are to be
determined. However, it is a well known fact that planes (and
automobiles) lose more value the first year than the second, and
more the second than the third, etc. This means that a linear
(straight line) function cannot accurately model this situation.
A better, nonlinear, function is:
Value = p0 + p1*exp(-p2*Age)
Where the `exp' function is the value of e (2.7182818...) raised
to a power. This type of function is known as "negative
exponential" and is appropriate for modeling a value whose rate
of decrease is proportional to the difference between the value
and some base value. The F33YEAR.NLR example command file fits a
linear function to the value of used airplanes. The F33EXP.NLR
example fits a negative exponential function to the same data.
Run both examples and compare the fitted functions. The
COOLING.NLR example also uses a negative exponential function.
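The behavior of the negative exponential function can be seen in
a small Python sketch. The parameter values below are
hypothetical and were chosen only for illustration; they are not
taken from the F33EXP.NLR example. The loss in value shrinks
every year, and each year's loss is the same fixed fraction
(exp(-p2)) of the previous year's loss.

```python
import math

def value(age, p0, p1, p2):
    """Negative exponential model: p0 + p1*exp(-p2*age)."""
    return p0 + p1 * math.exp(-p2 * age)

# Hypothetical parameter values chosen only for illustration.
p0, p1, p2 = 20000.0, 80000.0, 0.25

# Loss in value during each of the first five years.
yearly_loss = [value(t, p0, p1, p2) - value(t + 1, p0, p1, p2) for t in range(5)]
for year, loss in enumerate(yearly_loss, start=1):
    print(f"year {year}: loses {loss:10.2f}")
```

Here p0 plays the role of the base value that the function
approaches as Age grows, and p0 + p1 is the value at Age 0.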
Much of the convenience of Nonlin comes from the fact that you
can enter complicated functions using ordinary algebraic
notation. Examples of functions that can be handled with Nonlin
include:
Linear: Y = p0 + p1*X
Quadratic: Y = p0 + p1*X + p2*X^2
Multivariate: Y = p0 + p1*X + p2*Z + p3*X*Z
Exponential: Y = p0 + p1*exp(X)
Periodic: Y = p0 + p1*sin(p2*X)
Misc: Y = p0 + p1*X + p2*exp(X) + p3*sin(Z)
In other words, the function is a general expression involving
one dependent variable (on the left of the equal sign), one or
more independent variables, and one or more parameters whose
values are to be estimated.
Because of its generality, Nonlin can perform all of the
regressions handled by ordinary linear or multivariate regression
programs as well as nonlinear regression. However, in order to
handle nonlinear functions, Nonlin uses an iterative function
optimization algorithm which is slower than the simple linear
regression algorithm and has the potential for not converging to
a solution.
INSTALLING NONLIN
The NONLIN system consists of the following files:
NONLIN.EXE -- The executable program.
NONLIN.DOC -- Documentation file.
NONLIN.FON -- Font file used if you request a plot.
NONLIN.LJF -- HP LaserJet font file used if you print a plot.
*.NLR -- Example command files.
To install Nonlin, copy the files into the directory of your
choice. If you do not plan to generate hard copy output for a
LaserJet printer, you may delete the NONLIN.LJF file. If the
NONLIN.FON and NONLIN.LJF files are not in your current
directory, you must place a command of the following form in your
AUTOEXEC.BAT file to tell Nonlin where to look for its font
files:
SET NONLIN=directory
Where "directory" is the name of the device and directory where
the files are located. For example, if the files are located in
a directory named NONLIN on the C disk, the following command
could be used:
SET NONLIN=C:\NONLIN
USING NONLIN
Once Nonlin has been installed, it can be started using a DOS
command of the form:
NONLIN command_file
where "command_file" is the name of a file containing Nonlin
commands that control the analysis. The sections that follow
describe these commands. If you omit the command file name,
Nonlin prints a list of its commands.
If you wish to direct the output produced by Nonlin to a file or
printer, use the DOS `>' redirection operator on the NONLIN
command line. For example, to process a command file named
LINEAR.NLR, directing output to a file named LINEAR.LST, use the
following command:
NONLIN LINEAR > LINEAR.LST
At this point, I suggest you pause in your reading and try
running a Nonlin example to get a feel for how it works. Several
example files with the extension ".NLR" are provided with the
distribution. LINEAR.NLR is a good one to start with. If you do
not have a graphics monitor, edit the LINEAR.NLR command file
(and other example files) and remove the PLOT command.
FUNCTION SPECIFICATION
Much of the power of Nonlin comes from its ability to estimate
the value of parameters that are part of complicated functions
that you enter in ordinary algebraic form. This section explains
the arithmetic operators and built in functions that are used to
specify a function.
Arithmetic Operators
The following arithmetic operators may be used in expressions:
+ addition
- subtraction or unary minus
* multiplication
/ division
** or ^ exponentiation
Exponentiation has the highest precedence, followed by
multiplication and division, and then addition and subtraction.
Parentheses may be used to group terms.
As a convenience, Nonlin allows you to omit the multiplication
operator between a numeric constant and a following variable,
parameter, or function. For example, the expressions "2pi" and
"2 pi" are equivalent to "2*pi". Similarly, "5X" is equivalent
to "5*X". However, if you specify a number before the letter
"E", it will be taken as the exponential form of a number (see
below) rather than the number times the constant E (base of
natural logarithms).
Numeric Constants
Numeric constants may be written in their natural form (1, 0,
1.5, .0003, etc.) or in exponential form, n.nnnEppp, where n.nnn
is the base value and ppp is the power of ten by which the base
is multiplied. For example, the number 1.5E4 is equivalent to
15000. All numbers are treated as "floating point" values,
regardless of whether a decimal point is specified or not. As a
convenience for entering time values, if a value contains one or
more colons, the portion to the left of the colon is multiplied
by 60. For example, 1:00 is equivalent to 60; 1:00:00 is
equivalent to 3600.
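The colon rule can be sketched in Python as follows. This is one
reading of the rule stated above, not Nonlin's actual parser:
each time a colon is passed, everything accumulated to its left
is multiplied by 60. The function name parse_constant is made up
for this illustration.

```python
def parse_constant(text):
    """Convert a constant such as '1.5E4', '.0003', or '1:00:00' to a float."""
    total = 0.0
    for field in text.strip().split(":"):
        # Everything accumulated to the left of this colon is multiplied by 60.
        total = total * 60.0 + float(field)
    return total

for sample in ["1.5E4", ".0003", "1:00", "1:00:00"]:
    print(sample, "->", parse_constant(sample))
```

Under this reading a value such as 1:30 would come out as 90,
which is what you would expect when entering minutes and seconds.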
Symbolic Constants
There are two numeric constants that may be specified using
symbolic names. The symbolic name "PI" is equivalent to the
value of pi, 3.14159... Similarly, the symbolic constant "E" is
equivalent to the base of natural logarithms, 2.7182818...
Built in Functions
The following functions are built into Nonlin and may be used in
expressions:
ABS(x) -- Absolute value of x.
ACOS(x) -- Arc cosine of x. Angles are measured in radians.
ASIN(x) -- Arc sine of x. Angles are measured in radians.
ATAN(x) -- Arc tangent of x. Angles are measured in radians.
J0(x) -- Bessel function of the first kind, order zero.
J1(x) -- Bessel function of the first kind, order one.
JN(n,x) -- Bessel function of the first kind, order n.
COS(x) -- Cosine of x. Angles are measured in radians.
COSH(x) -- Hyperbolic cosine of x.
COT(x) -- Cotangent of x. (COT(x) = 1/TAN(x)).
CSC(x) -- Cosecant of x. (CSC(x) = 1/SIN(x)).
DEG(x) -- Converts an angle, x, measured in radians to the
equivalent number of degrees.
EXP(x) -- e (base of natural logarithms) raised to the x power.
FAC(x) -- x factorial (x!). Note, the FAC function is computed
using the GAMMA function (FAC(x)=GAMMA(x+1)) so
non-integer argument values may be computed.
GAMMA(x) -- Gamma function. Note, GAMMA(x+1) = x! (x factorial).
GAMMAI(x) -- Reciprocal of GAMMA function (GAMMAI(x) =
1/GAMMA(x)).
HAV(x) -- Haversine of x. (HAV(x) = (1-COS(x))/2).
LOG(x) -- Natural logarithm of x.
LOG10(x) -- Base 10 logarithm of x.
MAX(x1,x2) -- Maximum value of x1 or x2.
MIN(x1,x2) -- Minimum value of x1 or x2.
NORMAL(x) -- Normal probability distribution of x. X is in units
of standard deviations from the mean.
PULSE(a,x,b) -- Pulse function. If the value of x is less than a
or greater than b, the value of the function is 0. If
x is greater than or equal to a and less than or equal
to b, the value of the function is 1. In other words,
it is 1 on the closed interval [a,b] and zero elsewhere. If
you need a function that is zero on the interval [a,b]
and 1 elsewhere, use the expression (1-PULSE(a,x,b)).
RAD(x) -- Converts an angle measured in degrees to the equivalent
number of radians.
SEC(x) -- Secant of x. (SEC(x) = 1/COS(x)).
SEL(a1,a2,v1,v2) -- If a1 is less than a2 then the value of the
function is v1. If a1 is greater than or equal to a2,
then the value of the function is v2.
SIN(x) -- Sine of x. Angles are measured in radians.
SINH(x) -- Hyperbolic sine of x.
SQRT(x) -- Square root of x.
STEP(a,x) -- Step function. If x is less than a, the value of
the function is 0. If x is greater than or equal to a,
the value of the function is 1. If you need a function
which is 1 up to a certain value and then 0 beyond that
value, use the expression STEP(x,a).
T(n,x) -- Chebyshev polynomial of order n.
TAN(x) -- Tangent of x. Angles are measured in radians.
TANH(x) -- Hyperbolic tangent of x.
Y0(x) -- Bessel function of the second kind, order zero.
Y1(x) -- Bessel function of the second kind, order one.
YN(n,x) -- Bessel function of the second kind, order n.
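The piecewise functions in the list above are easy to misread,
so here is a minimal Python sketch of PULSE, STEP, SEL, and HAV
written directly from their descriptions. These are
illustrations of the stated definitions, not Nonlin's own code.

```python
import math

def pulse(a, x, b):
    """1 when a <= x <= b, otherwise 0."""
    return 1.0 if a <= x <= b else 0.0

def step(a, x):
    """0 when x < a, 1 when x >= a."""
    return 0.0 if x < a else 1.0

def sel(a1, a2, v1, v2):
    """v1 when a1 < a2, otherwise v2."""
    return v1 if a1 < a2 else v2

def hav(x):
    """Haversine: (1 - cos(x)) / 2."""
    return (1.0 - math.cos(x)) / 2.0

# Swapping the arguments of STEP gives a function that is 1 up to a
# threshold and 0 beyond it, as described for STEP(x,a).
threshold = 2.0
print([step(x, threshold) for x in (1.0, 2.0, 3.0)])
```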
NONLIN COMMAND FILES
The commands described in this section are placed in a command
file. When you start Nonlin, you specify the name of the command
file as a parameter on the command line. For example, if the
command file name is CAR.NLR, the following command would cause
Nonlin to execute the commands in the command file:
NONLIN CAR.NLR
If you do not specify a file name extension for the command file,
".NLR" is used by default. Command files can be created using a
text editor such as EDIT-32, EDLIN, the DOS version 5 EDIT
program, or any other editor or word processor that is capable of
creating an ASCII text file without formatting codes.
Comments may be placed in command files by preceding the comment
with an exclamation point. Entire lines may be used for comments
and comments can be placed at the end of commands.
Command lines can be continued by placing a semicolon character
as the last non-blank character on the line (a comment may follow
the semicolon) and then continuing the command on the following
line(s).
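The comment and continuation rules can be sketched in Python as
a small preprocessing step. This is an assumption about how such
rules might be applied, not a description of Nonlin's parser. In
particular, it assumes that an exclamation point always starts a
comment and that the trailing semicolon is dropped when lines
are joined. The function name logical_lines is hypothetical.

```python
def logical_lines(text):
    """Yield commands with '!' comments removed and ';' continuations joined."""
    pending = ""
    for raw in text.splitlines():
        # Assumption: '!' always starts a comment (no quoting rules are modeled).
        code = raw.split("!", 1)[0].rstrip()
        if code.endswith(";"):
            # Continued line: drop the semicolon and keep accumulating.
            pending += code[:-1].rstrip() + " "
            continue
        line = (pending + code).strip()
        pending = ""
        if line:
            yield line
    if pending.strip():
        # A continuation with nothing after it is emitted as it stands.
        yield pending.strip()

sample = """\
! Car depreciation example
VARIABLES VALUE, ;  ! dependent variable first
AGE,MILES
PARAMETERS BASE,DEPAGE,DEPMILES   ! three parameters
"""
for command in logical_lines(sample):
    print(command)
```

Note that the comment is stripped before the semicolon test, so
a comment may follow the semicolon on a continued line.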
Every command file must contain the following commands:
VARIABLES, PARAMETERS, FUNCTION, and DATA. The DATA statement
introduces the data for the analysis and must be the last command
in the file (data records may follow it). Other, optional,
commands may be interspersed in the command file.
The following is an example of a complete command file:
VARIABLES VALUE,AGE,MILES
PARAMETERS BASE,DEPAGE,DEPMILES
FUNCTION VALUE = BASE + DEPAGE*AGE + DEPMILES*MILES
DATA
(data records follow)
NONLIN COMMANDS
The following is a list of the valid Nonlin commands that can be
placed in a Nonlin command file. Command keywords may be
abbreviated to the first three letters. Nonlin commands are not
case sensitive.
TITLE string (optional) -- Specifies a title line that is printed
with the results of the analysis.
VARIABLES var1,var2,... (required) -- Specifies the names of the
variables that will be used in the function. The
dependent variable and the independent variables must be
specified. The order of the variable names must match
the order of the data values for each observation. You
may define more variables than you actually use in the
function specification. A maximum of 12 variables may be
specified. The length of a variable name is limited to
10 characters. Capitalize the variable names as you want
them displayed in the results.
If you wish to assign a weight to the observations (so
some observations are considered more significant than
others), use $WEIGHT as the name of the weight variable.
You may specify all of the variables on a single command
line (which may be continued), or you may have multiple
VARIABLES commands. If you use multiple commands, the
order in which they appear in the command file must match
the order of the variable values for each observation.
The VARIABLES command must precede the FUNCTION command.
PARAMETERS param1[=initial1],param2[=initial2],... (required) --
Specifies the names of the parameters whose values are to
be determined by Nonlin. Nonlin is capable of handling
up to 12 parameters. The parameter names may not exceed
10 characters in length. Do not specify any parameters
that are not used in the function. The PARAMETERS
command must precede the FUNCTION command.
Optionally, an initial estimate of the parameter value
may be specified by following the parameter name with an
equal sign and the value. If no value is specified, 1 is
used by default. Specifying an initial value that is
near the actual value usually speeds up the operation of
Nonlin and may enable it to successfully converge to a
solution. If Nonlin is unable to converge to a solution,
try specifying different starting values for the
parameters. Try to specify a value that at least has the
correct sign as the expected final value.
The CONSTRAIN command (described below) can be used to
limit the range of values for parameters. The SWEEP
command can be used to perform the regression analysis
with a range of parameter initial values.
CONSTRAIN parameter=lowvalue,highvalue (optional) -- Specifies a
lower and upper limit on the range of a parameter value.
During the solution process, Nonlin may allow a
parameter's value to temporarily move in a direction away
from its final value. With some functions it may be
necessary to constrain the parameter's value so that it
does not go negative (e.g., if the function takes the
square root of the parameter), or zero (if the parameter
is in a denominator).
Only a single parameter and its associated limits may be
specified on each CONSTRAIN command, but you may use
multiple CONSTRAIN commands. The PARAMETERS command must
precede the CONSTRAIN command.
The parameter value is allowed to range from `lowvalue'
to `highvalue'. If you want to prevent a parameter value
from going to zero, you must specify a value greater than
zero for the low value (specifying zero would allow it to
reach, but not go below, zero). For example, the
following command constrains the value of a parameter named
`age' to be greater than zero (at least .0001) and less than
or equal to 100:
CONSTRAIN age = .0001,100
See the COOLING.NLR, F33EXP.NLR, and POWER.NLR files for
examples of the CONSTRAIN command.
SWEEP parameter=lowvalue,highvalue,stepsize (optional) --
Specifies that the regression analysis is to be performed
repeatedly with a set of starting values for the
parameter. The first analysis is performed with the
parameter having the `lowvalue'; the value of `stepsize'
is then added to the parameter's initial value and the
analysis is performed again. The process is repeated
until the value of the parameter reaches `highvalue'.
Each time the analysis is performed the value of the
residual sum of squares is compared with the best
previous result. The estimated values of the parameters
for the best starting value are saved and used for the
final analysis and report.
Only one parameter may be specified on each SWEEP
command, but you may have as many SWEEP commands as there
are parameters. The number of regression analyses
performed will be equal to the product of the number of
parameter values for each SWEEP command.
The SWEEP command is useful when you are trying to fit a
complicated function that may have "local minimum" values
other than the "global minimum". Periodic functions
(sin, cos, etc.) are especially troublesome.
See the SINE.NLR command file for an example of the SWEEP
command.
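To see how quickly the number of analyses grows when several
SWEEP commands are combined, consider the following Python
sketch. The parameter names and ranges are hypothetical, and the
small round-off tolerance in sweep_values is an assumption of
this sketch rather than documented Nonlin behavior.

```python
import itertools

def sweep_values(low, high, step):
    """Starting values low, low+step, ... up to and including high."""
    # Assumption: a small tolerance guards against floating-point round-off.
    count = int((high - low) / step + 1e-9) + 1
    return [low + i * step for i in range(count)]

# Hypothetical sweeps over two parameters.
sweeps = {
    "frequency": sweep_values(1.0, 5.0, 0.5),   # 9 starting values
    "amplitude": sweep_values(1.0, 3.0, 1.0),   # 3 starting values
}

# One regression would be run for every combination of starting values.
combinations = list(itertools.product(*sweeps.values()))
print(len(combinations))  # 9 * 3 = 27
```

Because the counts multiply, a fine step size on several
parameters at once can make the total running time very long.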
FUNCTION depvar = function (required) -- Specifies the form of
the function whose parameters are to be determined. The
dependent variable must be the only thing to the left of
the equal sign. The expression to the right of the equal
sign may contain variables, parameters, constants,
operators, and library functions such as sqrt, sin, exp,
etc. The VARIABLES and PARAMETERS commands must have
appeared in the command file before the FUNCTION command,
and all variables and parameters used in the function
must have been specified on those commands. Some example
FUNCTION commands are shown below:
FUNCTION Y = P0 + P1*X
FUNCTION DISTANCE = .5 * ACCEL * TIME^2
FUNCTION VALUE = PRICE + YRDEP*AGE + MILEDEP*MILES
FUNCTION POPULATN = BASE * EXP(GROWRATE*TIME)
TOLERANCE value (optional, default=1E-10) -- Specifies the
tolerance factor that is used to determine when the
algorithm has converged to a solution. Reducing the
tolerance value may produce a slightly more accurate
result but will increase the number of iterations and the
running time.
ITERATIONS value (optional, default=50) -- Specifies the maximum
number of iterations that should be attempted by the
algorithm. If the solution does not converge to the
limit specified by the TOLERANCE command (or to the
default tolerance) before the maximum number of
iterations is reached, the process is stopped and the
results are printed. Failure to converge before the
specified number of iterations could be caused by one of
three things:
1. The maximum allowed number of iterations may be too
small. Try using an ITERATIONS command with a larger
value.
2. The tolerance factor may be too small. Even a
properly converging solution will eventually "level off"
or oscillate around a good, but non-zero, sum of squares
value. Try using the TOLERANCE command to increase the
tolerance value.
3. The function may not be converging. Try specifying
better (or at least different) starting values for the
parameters on the PARAMETERS command. Consider using the
SWEEP command to specify a range of parameter starting
values.
REGISTER (optional) -- The REGISTER command suppresses the
copyright and registration message that is normally
printed as part of a Nonlin report. The use of this
command is a reminder that you should register your use
of Nonlin.
OUTPUT [TO file] var1,var2,... (optional) -- Specifies that after
the analysis is completed, data values are to be printed
or written to a file. If the "TO file" portion of the
command is specified, the output is written to the
specified file. If this portion of the command is
omitted, the output values are printed along with the
results. If a file name is specified without an
extension, ".OUT" is used by default.
The list of variable names determines which variables are
written to the file and the order in which the values
appear in each output record. Any variable previously
declared on a VARIABLES command may be specified. In
addition, the following special variable names may appear
in the output list:
$OBS -- The observation record number, starting at 1 and
increasing by 1.
$PREDICTED -- The predicted value for the dependent
variable for the observation, given the independent
variable values and the parameters as calculated by the
analysis.
$RESIDUAL -- The difference between the actual value of
the dependent variable and its predicted value.
Examples of OUTPUT commands are shown below:
OUTPUT AGE,MILES,VALUE,$PREDICTED,$RESIDUAL
OUTPUT TO GROWTH.DAT $OBS,TIME,POPULATN,$PREDICTED
PLOT [options] -- Display a plot of the calculated function and
the data observations. The PLOT command may be used only
if there is a single independent variable (multiple
independent variables would require an n-dimensional
surface plot); however, there is no restriction on the
number of parameters being estimated. You must have a
CGA, EGA, or VGA monitor to use the PLOT command, and the
NONLIN.FON font file must be in the current directory or
in a directory specified by the NONLIN environment
variable. In the plot, the data values you provided are
shown as blue X's and the function fitted to the data by
Nonlin is shown as a solid green line. Press Return to
proceed with the analysis after you have finished looking
at the plot.
The following four options may be specified on the PLOT
command:
GRID -- display grid lines to make it easier to estimate
values.
RESIDUAL -- draw vertical lines from each observed data
point to the corresponding point on the calculated
function line. These lines represent the "residual"
value that Nonlin is attempting to minimize.
ITERATION -- draw a plot for each iteration of the
regression analysis. Normally, the plot is drawn after
the analysis has converged to a solution; you may use the
ITERATION option to observe the function during each
iteration of the analysis as it converges to fit the
data.
PRINT -- print a copy of the plot on an HP LaserJet
printer. Nonlin writes the plot to the PRN device which
must be attached to an HP Series II or Series III
printer. The NONLIN.LJF font file must be in the current
directory or in a directory specified by the NONLIN
environment variable.
The option keywords may be abbreviated to their first
letter. If more than one option is specified, separate
them with commas. For example, to produce a plot with
both grid lines and residual lines, use the following
command:
PLOT GRID,RESIDUAL
DATA [file] (required) -- Specifies the name of the file
containing the data records, or introduces the data
records which follow the command. If a file name is
specified on the DATA command, the file is opened, its
data records are read, and the regression analysis is
performed. If a file name is specified without an
extension, ".DAT" is used by default.
If no file name is specified on the DATA command, the
data records must immediately follow the DATA command in
the command file.
Each data record must contain at least as many data
values as the number of variables specified on the
VARIABLES command(s). The order of the variables as
specified on the VARIABLES command must match the order
of the values in each observation. Any data values
beyond those required for the specified variables are
ignored. Each observation must begin on a new line.
The data values must be separated by one or more spaces
and/or a comma. Data values may contain decimal points
and may be expressed in exponential notation
(i.e., n.nnnnEppp). As a convenience for entering time
values, if a value contains one or more colons, the
portion to the left of the colon is multiplied by 60.
For example, 1:00 is equivalent to 60; 1:00:00 is
equivalent to 3600.
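A data record in this format could be read with a few lines of
Python, as sketched below. The function name parse_record is
made up for this illustration, and treating any run of spaces
and commas as a single separator is an assumption. Time values
containing colons and continued records are not handled here.

```python
import re

def parse_record(line, names):
    """Split one data record and pair its values with the variable names."""
    # Assumption: any run of spaces and/or commas counts as one separator.
    fields = [f for f in re.split(r"[\s,]+", line.strip()) if f]
    if len(fields) < len(names):
        raise ValueError("record has fewer values than variables")
    # Values beyond those needed for the declared variables are ignored.
    return {name: float(text) for name, text in zip(names, fields)}

names = ["AGE", "MILES", "VALUE"]
records = [parse_record(line, names)
           for line in ["2 10000 13000", "4, 42000, 9000 99"]]
for record in records:
    print(record)
```

The second sample record carries an extra value (99), which is
dropped because only three variables were declared.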
You may continue data lines by specifying a semicolon as
the last non-blank character on a record and then placing
the continuation value on the following line(s).
The DATA command must be the last command in the command
file.
The following is an example of a complete command file
including data records:
VARIABLES AGE,MILES,VALUE
PARAMETERS BASE,DEPAGE,DEPMILES
FUNCTION VALUE = BASE + DEPAGE*AGE + DEPMILES*MILES
DATA
2 10000 13000
4 42000 9000
1 7000 17000
6 52000 6000
5 48000 8000
If the data records had been placed in a separate file
named CAR.DAT, the DATA statement would be changed to
"DATA CAR.DAT".
UNDERSTANDING THE RESULTS
Nonlin prints a variety of statistics at the end of each
analysis. For each variable, Nonlin lists the mean value, the
minimum value, and the maximum value. You should confirm that
these values are within the ranges you expect.
For each parameter, Nonlin displays the initial parameter
estimate and the final estimate. The final estimate values are
the results of the analysis. By substituting these values in the
equation you specified to be fitted to the data, you will have a
function that can be used to predict the value of the dependent
variable based on a set of values for the independent variables.
For example, if the equation being fitted is
y = p0 + p1*x
and the final estimates are 1.5 for p0 and 3 for p1, then the
equation
y = 1.5 + 3*x
is the best equation of this form that will predict the value of
y based on the value of x.
In addition to the variable and parameter values, Nonlin displays
several statistics that indicate how well the equation fits the
data. The "Final sum of squared deviations" is the sum of the
squared differences between the actual value of the dependent
variable for each observation and the value predicted by the
function, using the final parameter estimates.
The "Average deviation" is the average over all observations of
the absolute value of the difference between the actual value of
the dependent variable and its predicted value.
The "Maximum deviation for any observation" is the maximum
difference (ignoring sign) between the actual and predicted value
of the dependent variable for any observation.
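The three deviation statistics described above can be computed
as in the following Python sketch. The observed and predicted
values are hypothetical and are not output from Nonlin.

```python
def deviation_statistics(actual, predicted):
    """Return (sum of squared deviations, average deviation, maximum deviation)."""
    deviations = [a - p for a, p in zip(actual, predicted)]
    sum_sq = sum(d * d for d in deviations)
    average = sum(abs(d) for d in deviations) / len(deviations)
    maximum = max(abs(d) for d in deviations)
    return sum_sq, average, maximum

# Hypothetical observed and predicted values of the dependent variable.
actual = [13000.0, 9000.0, 17000.0, 6000.0, 8000.0]
predicted = [14500.0, 9900.0, 16150.0, 6400.0, 8100.0]

sum_sq, average, maximum = deviation_statistics(actual, predicted)
print(sum_sq, average, maximum)
```

Both the average and the maximum use absolute values, so
positive and negative deviations do not cancel each other.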
The "Proportion of variance explained (r^2)" indicates how much
better the function predicts the dependent variable than just
using the mean value of the dependent variable. It is computed
as follows: Suppose that we did not fit an equation to the data
and ignored all information about the independent variables in
each observation. Then, the best prediction for the dependent
variable value for any observation would be the mean value of the
dependent variable over all observations. The "variance" is the
sum of the squared differences between the mean value and the
value of the dependent variable for each observation. Now, if we
use our fitted function to predict the value of the dependent
variable, rather than using the mean value, a second kind of
variance can be computed by taking the sum of the squared
difference between the value of the dependent variable predicted
by the function and the actual value. Hopefully, the variance
computed by using the values predicted by the function is better
(i.e., a smaller value) than the variance computed using the mean
value. The "Proportion of variance explained" is computed as
1 - (variance using predicted value / variance using mean). If
the function perfectly predicts the observed data, the value of
this statistic will be 1.00 (100%). If the function does no
better a job of predicting the dependent variable than using the
mean, the value will be 0.00.
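The calculation just described can be written as a short Python
sketch. The sample values are hypothetical, and the function
name is made up for this illustration.

```python
def proportion_of_variance_explained(actual, predicted):
    """1 - (variance using predicted values / variance using the mean)."""
    mean = sum(actual) / len(actual)
    variance_mean = sum((a - mean) ** 2 for a in actual)
    variance_fit = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - variance_fit / variance_mean

actual = [1.0, 2.0, 3.0, 4.0]                      # hypothetical observations
close_fit = [1.1, 1.9, 3.2, 3.8]                   # hypothetical predictions
mean_only = [sum(actual) / len(actual)] * len(actual)

print(proportion_of_variance_explained(actual, actual))     # perfect fit
print(proportion_of_variance_explained(actual, close_fit))  # good fit
print(proportion_of_variance_explained(actual, mean_only))  # no better than mean
```

Note that this statistic is undefined when every observation of
the dependent variable has the same value, because the variance
about the mean is then zero.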
THEORY OF OPERATION
The basis for the minimization technique used by Nonlin is to
compute the sum of the squared residuals for one set of parameter
values and then slightly alter each parameter value and recompute
the sum of squared residuals to see how the parameter value
change affects the sum of the squared residuals. By dividing the
difference between the original and new sum of squared residual
values by the amount the parameter was altered, Nonlin is able to
determine the approximate partial derivative with respect to the
parameter. This partial derivative is used by Nonlin to decide
how to alter the value of the parameter for the next iteration.
If the function being modeled is well behaved, and the starting
value for the parameter is not too far from the optimum value,
the procedure will eventually converge to the best estimate for
the parameter. This procedure is carried out simultaneously for
all parameters and is, in fact, a minimization problem in
n-dimensional space, where `n' is the number of parameters.
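The approximate partial derivatives described above can be
sketched in Python as follows. The forward difference scheme,
the step size (delta), and the function names are assumptions
made for this illustration; the sketch does not reproduce
Nonlin's actual algorithm, and it deliberately stops short of
showing how the parameters are updated.

```python
def sum_sq(params, data, model):
    """Sum of squared residuals for one set of parameter values."""
    return sum((y - model(params, x)) ** 2 for x, y in data)

def approximate_gradient(params, data, model, delta=1e-6):
    """Estimate each partial derivative by slightly altering one parameter."""
    base = sum_sq(params, data, model)
    gradient = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += delta
        # (new sum of squares - original sum of squares) / amount altered
        gradient.append((sum_sq(shifted, data, model) - base) / delta)
    return gradient

def line(params, x):
    """y = p0 + p1*x"""
    return params[0] + params[1] * x

# Synthetic, noise-free data generated from y = 1.5 + 3*x.
data = [(x, 1.5 + 3.0 * x) for x in range(5)]

at_minimum = approximate_gradient([1.5, 3.0], data, line)
far_away = approximate_gradient([0.0, 0.0], data, line)
print(at_minimum)  # both entries close to zero
print(far_away)    # both entries strongly negative
```

A negative partial derivative means that increasing the
parameter reduces the sum of squared residuals, so the next
trial value should be larger.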
HINTS FOR NONLIN USE
Convergence Failures
One of the potential problems that confronts any nonlinear
minimization procedure is that of non-convergence.
Non-convergence is usually not a problem for regressions using a
linear model, but becomes a more serious consideration when using
complicated nonlinear functions; increasing the number of
parameters aggravates the problem.
Non-convergence can occur in two ways: the solution may diverge
or it may converge to the wrong solution -- a local minimum
rather than the global minimum. Periodic functions, such as sin
and cos, are particularly prone to convergence problems. For
example, consider a nonlinear regression performed with the
function:
y = offset + amplitude * sin(frequency * x)
where x and y are variables, and offset, amplitude, and frequency
are the parameters whose values are to be determined. If the
starting value for frequency is not reasonably close to the
correct value, the solution may converge to a harmonic (an
integer multiple) or a subharmonic (an integer fraction) of the
correct frequency. A command
file named SINE.NLR is supplied with the commands and data to
perform this analysis.
The SWEEP command can be very useful in cases like the sine
example. In the SINE.NLR example analysis, the actual value of
the frequency is 3; the function converges to the correct
solution if the starting value is in the range 2.6 to 3.3.
However, this example is quite insensitive to the starting value
of the amplitude parameter. With an actual value of 2, the
correct solution is found with starting values from 1 through
10000. Similarly, the offset parameter, which had an actual
value of 10, was successfully determined with starting values
ranging from 1 to over 50000.
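To see why the frequency starting value matters so much more than
the others, the following Python sketch evaluates the sum of
squared residuals over a range of trial frequencies while holding
offset and amplitude at their true values. The data are synthetic
and noise-free (offset 10, amplitude 2, frequency 3, as in the
description above); they are not the data in SINE.NLR, and the x
range and grid spacing are assumptions made for the example.

```python
import math

# Synthetic observations: y = 10 + 2*sin(3*x) for x = 0.0 .. 9.9
xs = [i / 10.0 for i in range(100)]
ys = [10.0 + 2.0 * math.sin(3.0 * x) for x in xs]


def ssr_for_frequency(freq):
    """Sum of squared residuals with offset and amplitude held fixed."""
    return sum((y - (10.0 + 2.0 * math.sin(freq * x))) ** 2
               for x, y in zip(xs, ys))


# Trial frequencies 0.5, 0.55, ..., 6.0 (a crude stand-in for a sweep).
freqs = [k / 20.0 for k in range(10, 121)]
ssr = [ssr_for_frequency(f) for f in freqs]

# Grid points that are lower than both neighbors are local minima.
local_minima = [freqs[k] for k in range(1, len(freqs) - 1)
                if ssr[k] < ssr[k - 1] and ssr[k] < ssr[k + 1]]
best = min(zip(ssr, freqs))[1]
```

The list `local_minima` contains several frequencies besides the
true value held in `best`. A downhill search started near one of
the other minima can settle there, which is why trying several
starting values is worthwhile.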
Another example which is sensitive to a parameter starting value
is POWER.NLR which attempts to determine the values of the
parameters p0, p1, and p2 for the function
y = p0 + p1*x^p2
(where "x^p2" means x raised to the p2 power). The actual value
of p2 in the example data is 2; the solution converges correctly
if the starting value of p2 is in the range 1.8 to 3.8. As with
the other example, the solution is relatively insensitive to the
starting values of p0 and p1.
Singular Matrix Problems
Another possible problem is that the analysis may stop with the
message "Singular convergence. Mutually dependent parameters?".
This is usually due to one of two things: (1) a redundant
parameter that is co-dependent with another parameter, or (2) a
situation where the value of one parameter "blocks" the effect of
other parameters.
As an example of a redundant parameter, consider the function
y = p0 + p1*p2*x
This is a simple linear equation except that there are two
parameters, p1 and p2, which both multiply the variable x. There
is no unique solution to this problem because only the product
p1*p2 is determined by the data: any non-zero value of p1 is
possible if the right value of p2 is chosen.
Similarly, the function
y = p0 + p1 + p2*x
has no unique solution since only the sum p0 + p1 is determined
by the data; either p0 or p1 is redundant.
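Both kinds of redundancy can be checked numerically. The Python
sketch below uses made-up x values and parameter values. It first
shows that two different (p1, p2) pairs give identical predictions
for y = p0 + p1*p2*x, and then shows that for y = p0 + p1 + p2*x the
matrix formed from the partial derivatives has a zero determinant,
which is what "singular" means. The helper names are assumptions
made for this example and are not part of Nonlin.

```python
def product_model(p0, p1, p2, x):
    return p0 + p1 * p2 * x


xs = [0.0, 1.0, 2.0, 3.0]

# Two different parameter sets with the same product p1*p2 = 6.
fit_a = [product_model(1.0, 2.0, 3.0, x) for x in xs]
fit_b = [product_model(1.0, 6.0, 1.0, x) for x in xs]
same_predictions = fit_a == fit_b

# For y = p0 + p1 + p2*x the partial derivatives with respect to
# (p0, p1, p2) are (1, 1, x): the first two columns are identical.
jac = [(1.0, 1.0, x) for x in xs]


def normal_matrix(rows):
    """Return J-transpose times J for a list of Jacobian rows."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)]
            for i in range(n)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


determinant = det3(normal_matrix(jac))
```

Here `same_predictions` is True and `determinant` is zero, so no
amount of data can separate the redundant parameters.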
The second type of singular matrix problem can be illustrated by
the function
y = p0 + p1*x^p2
If, during the solution process, p1 takes on the value 0, then
varying the value of p2 has no effect on the equation and Nonlin
cannot figure out which way to change the value of p2 to move
toward convergence. The solution to this problem is to assign a
starting value that is not zero to p1, and use the CONSTRAIN
command to force p1 to remain non-zero.
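The "blocking" effect is easy to demonstrate. In the Python sketch
below, which uses hypothetical data generated from p0 = 5, p1 = 0.5,
and p2 = 2, the sum of squared residuals is computed for several
values of p2, first with p1 held at zero and then with p1 non-zero.
The function names are assumptions made for this example.

```python
def power_model(params, x):
    p0, p1, p2 = params
    return p0 + p1 * x ** p2


def sum_sq_residuals(params, xs, ys):
    return sum((y - power_model(params, x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical noise-free data from y = 5 + 0.5*x^2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0 + 0.5 * x ** 2 for x in xs]

trial_p2 = (0.5, 1.0, 2.0, 3.0)
# With p1 = 0 the term p1*x^p2 vanishes whatever p2 is.
blocked = [sum_sq_residuals([5.0, 0.0, p2], xs, ys) for p2 in trial_p2]
# With p1 = 0.5 the fit responds to changes in p2.
unblocked = [sum_sq_residuals([5.0, 0.5, p2], xs, ys) for p2 in trial_p2]
```

Every entry of `blocked` is the same number, so the data give no
indication of which way p2 should move. The entries of `unblocked`
differ and reach zero at p2 = 2.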
PERFORMANCE ISSUES
Nonlin is carefully programmed and compiled with an optimizing
compiler for maximum performance. However, Nonlin is a real
"number cruncher," and the nonlinear regression algorithm is
mathematically very elaborate. During each iteration, Nonlin
computes gradients, Jacobians, Hessians, and eigenvalues, and
performs QR and Cholesky matrix decompositions. All calculations
are carried out using double precision (64-bit) floating point.
Nonlin does not require an 80x87 numeric coprocessor, but its
performance is greatly enhanced if one is present. In fact, an
8088 CPU with an 8087 numeric coprocessor can perform regression
analyses faster than a 20 MHz 80386 that does not have a
coprocessor. If you have an 8088 without a coprocessor, be
patient -- Nonlin is probably giving it the workout of its life.
Very long running times can result if you use the SWEEP command
with many starting values. The problem is compounded if you have
multiple SWEEP commands. If you use the SWEEP command to try a
large number of starting parameter values, you can save time by
using the ITERATIONS command to specify a small number of
iterations (such as 5) during the initial attempt to find a
solution. Once a feasible set of starting parameter values has
been determined, remove the SWEEP command, specify the starting
values on the PARAMETERS command, increase the number of
iterations, and rerun the analysis to get the final result.
PROGRAM LIMITS
The following is a summary of the Nonlin program limitations:
Maximum number of variables = 12
Maximum number of parameters = 12
Maximum length of variable or parameter names = 10
The maximum number of data observations that Nonlin can handle
depends on the number of parameters as shown by the table that
follows:
# Parameters    Max Observations
------------    ----------------
      1               2019
      2               1611
      3               1339
      4               1144
      5                997
      6                883
      7                791
      8                715
      9                652
     10                599
EXAMPLE ANALYSES
A number of example regression analysis files are provided with
your Nonlin distribution. All of the example command files have
the extension ".NLR". Some of the important ones are described
below; others contain comment lines that explain what they do.
LINEAR.NLR -- Simple linear regression with plotted function and
data.
QUAD.NLR -- Fit a quadratic equation. Plot the function and the
data.
ASYMPTOT.NLR -- Fit an asymptotic function Y = 12 - 10/X.
F33.NLR -- Multivariate linear regression. Calculate the value
of a used Beech F33 Bonanza airplane based on its age,
the number of hours on its airframe, and the number of
hours on its engine.
F33YEAR.NLR -- Similar to F33.NLR except the price of the Bonanza
is calculated based only on the age. The function is
plotted.
F33EXP.NLR -- Similar to F33YEAR.NLR except a negative
exponential function is used rather than a linear
function.
SINE.NLR -- Fit an equation involving a sin function. The SWEEP
command is used to find a starting point that will
converge.
COOLING.NLR -- Fit an equation involving an exponential function.
If a heated object is allowed to cool, the rate of
cooling at any instant is proportional to the
difference between the object's temperature and ambient
(room) temperature. The function that relates the
object's temperature to time is:
Temperature = Roomtemp+InitTemp*exp(-Coolrate*Time)
where InitTemp is the amount by which the object's
temperature exceeds room temperature at time 0, and
Coolrate is a factor that depends on the mass of the
object, how well it is insulated, etc. The exp
function is the value of e (2.7182818...) raised to a
power. The COOLING.NLR example determines the
parameters InitTemp and Coolrate to fit an equation of
this form to some data the author collected.
MAGNET.NLR -- Fit a function involving an arc tangent and a
variable to the third power. This is an interesting
physics problem. If a magnet is placed due east of a
compass, the deflection of the compass needle from
north is equal to the arc tangent of the ratio of the
strength of the magnet's field to the strength of the
earth's magnetic field. The strength of the magnet's field at
the compass is inversely proportional to the cube of
the distance from the magnet to the compass. Thus, the
function relating these terms is
Deflection = deg(atan(Strength / Distance ^ 3))
The deg function converts an angle in radians to
degrees. In the example, Deflection and Distance are
the variables, and the value of the Strength parameter
is determined.
DIODE.NLR -- The current through a diode increases sharply as the
voltage across the diode is increased. An equation
that approximates the current flow as a function of the
voltage is:
I = a*exp(b*(V-c))
where `I' is the current, `V' is the voltage, and `a',
`b', and `c' are parameters that are to be estimated by
the nonlinear regression.
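For readers who want to experiment with the last three models
outside of Nonlin, they can be written as ordinary Python
functions. This is only a sketch: the function and argument names
mirror the formulas above, and the numbers used to exercise them
are invented, not taken from the example data files.

```python
import math


def cooling(time, roomtemp, inittemp, coolrate):
    """Temperature = Roomtemp + InitTemp*exp(-Coolrate*Time)"""
    return roomtemp + inittemp * math.exp(-coolrate * time)


def deflection(distance, strength):
    """Deflection = deg(atan(Strength / Distance ^ 3)), in degrees."""
    return math.degrees(math.atan(strength / distance ** 3))


def diode_current(voltage, a, b, c):
    """I = a*exp(b*(V-c))"""
    return a * math.exp(b * (voltage - c))


# Hypothetical values chosen so the results are easy to check by hand.
temp_at_start = cooling(0.0, roomtemp=20.0, inittemp=60.0, coolrate=0.1)
angle = deflection(distance=2.0, strength=8.0)     # ratio 8/2^3 = 1
current_at_c = diode_current(0.6, a=1e-3, b=20.0, c=0.6)
```

At time 0 the cooling formula gives Roomtemp + InitTemp (80 here),
a field ratio of 1 gives a deflection of 45 degrees, and at V = c
the diode current equals a.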
ACKNOWLEDGEMENT
The nonlinear regression algorithm used by Nonlin was published
in: Dennis, J.E., Gay, D.M., and Welsch, R.E., "An adaptive
nonlinear least-squares algorithm," ACM Transactions on
Mathematical Software 7, 3 (Sept. 1981).
USE AND DISTRIBUTION OF NONLIN
Nonlin is a "shareware" product. You are welcome to make copies
of this program and pass them on to friends or post this program
on bulletin boards.
However, if you find Nonlin to be useful and/or entertaining you
are expected to compensate the author by sending the registration
form printed on a following page with $20 to help cover the
development and support of Nonlin. In return, you will receive
the most recent version of the program along with a bound manual.
Specify the type of disk you wish to receive. Add $5 if Nonlin
is being shipped out of the United States.
See also the special offer that follows involving the Mathplot
program.
You are welcome to write to the author:
Phillip H. Sherrod
4410 Gerald Place
Nashville, TN 37205-3806
Both the Nonlin program and documentation are copyright (c) 1992
by Phillip H. Sherrod. You are not authorized to modify the
program. "Nonlin" is a trademark.
Disclaimer
Nonlin is provided "as is" without warranty of any kind, either
expressed or implied. This program may contain "bugs" and
inaccuracies, and its results should not be assumed to be correct
unless they are verified by independent means. The author
assumes no responsibility for the use of Nonlin and will not be
responsible for any damage resulting from its use.
M A T H P L O T
Mathematical Function Plotting Program
Special Offer
If you like Nonlin, you should check out the Mathplot program by
the same author.
Mathplot allows you to specify complicated mathematical functions
using ordinary algebraic expressions and immediately plot them.
Four types of functions may be specified: cartesian (Y=f(X));
parametric cartesian (Y=f(T) and X=f(T)); polar
(Radius=f(Angle)); and parametric polar (Radius=f(T) and
Angle=f(T)). Up to four functions may be plotted simultaneously.
Scaling is automatic. Options are available to control axis
display and labeling as well as grid lines. Hard copy output may
be generated as well as screen display. Mathplot is an ideal
tool for engineers, scientists, math and science teachers, and
anyone else who needs to quickly visualize mathematical
functions.
SPECIAL OFFER
Registered users of Nonlin can order Mathplot for a special price
of $18. Or, for an even better deal, if you register Nonlin and
order Mathplot at the same time, you can get both for $36.
=====================================================================
Software Order Form
=====================================================================
NAME ______________________________________________________
ADDRESS ___________________________________________________
CITY _______________________ STATE _______ ZIP ___________
TELEPHONE _________________________________________________
NONLIN VERSION (on title page) ____________________________
BULLETIN BOARD WHERE YOU FOUND NONLIN _____________________
COMMENTS __________________________________________________
Check the box below which indicates your order type:
___ I wish to register Nonlin ($20).
___ I wish to order Mathplot ($20).
___ I wish to register Nonlin and order Mathplot ($36).
Add $5 to any amount shown above if the software is being shipped
out of the United States.
In return for registering, you will receive the most recent
version of the program and a bound copy of the manual.
Distribution disk choice (check one):
3.50" HD (1.4 MB) ______
5.25" HD (1.2 MB) ______
5.25" DD (360 KB) ______
Send this form with the amount indicated to the author:
Phillip H. Sherrod
4410 Gerald Place
Nashville, TN 37205-3806